Search results for "circular word"
showing 9 items of 9 documents
Alignment-free sequence comparison using absent words
2018
Sequence comparison is a prerequisite to virtually all comparative genomic analyses. It is often realised by sequence alignment techniques, which are computationally expensive. This has led to increased research into alignment-free techniques, which are based on measures referring to the composition of sequences in terms of their constituent patterns. These measures, such as $q$-gram distance, are usually computed in time linear with respect to the length of the sequences. In this paper, we focus on the complementary idea: how two sequences can be efficiently compared based on information that does not occur in the sequences. A word is an {\em absent word} of some sequence if it does not oc…
Marked systems and circular splicing
2007
Splicing systems are generative devices of formal languages, introduced by Head in 1987 to model biological phenomena on linear and circular DNA molecules. In this paper we introduce a special class of finite circular splicing systems named marked systems. We prove that a marked system S generates a regular circular language if and only if S satisfies a special (decidable) property. As a consequence, we show that we can decide whether a regular circular language is generated by a marked system and we characterize the structure of these regular circular languages.
Suffixes, Conjugates and Lyndon Words
2013
In this paper we are interested in the study of the combinatorial aspects connecting three important constructions in the field of string algorithms: the suffix array, the Burrows-Wheeler transform (BWT) and the extended Burrows-Wheeler transform (EBWT). Such constructions involve the notions of suffixes and conjugates of words and are based on two different order relations, denoted by $\plex$ and $\pom$, that, even if strictly connected, are quite different from the computational point of view. In this study an important role is played by Lyndon words. In particular, we improve the upper bound on the number of symbol comparisons needed to establish the $\pom$ order between two primitive wo…
A characterization of regular circular languages generated by marked splicing systems
2009
AbstractSplicing systems are generative devices of formal languages, introduced by Head in 1987 to model biological phenomena on linear and circular DNA molecules. A splicing system is defined by giving an initial set I and a set R of rules. Some unanswered questions are related to the computational power of circular splicing systems. In particular, a still open question is to find a characterization of circular languages generated by finite circular splicing systems (i.e., circular splicing systems with both I and R finite sets). In this paper we introduce a special class of the latter systems named marked systems. We prove that a marked system S generates a regular circular language if an…
On the regularity of circular splicing languages : A survey and new developments
2009
Circular splicing has been introduced to model a specific recombinant behaviour of circular DNA, continuing the investigation initiated with linear splicing. In this paper we focus on the relationship between regular circular languages and languages generated by finite circular splicing systems. We survey the known results towards a characterization of the intersection between these two classes and provide new contributions on the open problem of finding this characterization. First, we exhibit a non-regular circular language generated by a circular simple system thus disproving a known result in this area. Then we give new results related to a restrictive class of circular splicing systems…
SORTING CONJUGATES AND SUFFIXES OF WORDS IN A MULTISET
2014
In this paper we are interested in the study of the combinatorial aspects related to the extension of the Burrows-Wheeler transform to a multiset of words. Such study involves the notion of suffixes and conjugates of words and is based on two different order relations, denoted by <lex and ≺ω, that, even if strictly connected, are quite different from the computational point of view. In particular, we introduce a method that only uses the <lex sorting among suffixes of a multiset of words in order to sort their conjugates according to ≺ω-order. In this study an important role is played by Lyndon words. This strategy could be used in applications specially in the field of Bioinformatic…
Linear-time sequence comparison using minimal absent words & applications
2016
Sequence comparison is a prerequisite to virtually all comparative genomic analyses. It is often realized by sequence alignment techniques, which are computationally expensive. This has led to increased research into alignment-free techniques, which are based on measures referring to the composition of sequences in terms of their constituent patterns. These measures, such as q-gram distance, are usually computed in time linear with respect to the length of the sequences. In this article, we focus on the complementary idea: how two sequences can be efficiently compared based on information that does not occur in the sequences. A word is an absent word of some sequence if it does not occur in…
Minimal forbidden factors of circular words
2017
Minimal forbidden factors are a useful tool for investigating properties of words and languages. Two factorial languages are distinct if and only if they have different (antifactorial) sets of minimal forbidden factors. There exist algorithms for computing the minimal forbidden factors of a word, as well as of a regular factorial language. Conversely, Crochemore et al. [IPL, 1998] gave an algorithm that, given the trie recognizing a finite antifactorial language $M$, computes a DFA recognizing the language whose set of minimal forbidden factors is $M$. In the same paper, they showed that the obtained DFA is minimal if the input trie recognizes the minimal forbidden factors of a single word.…
Minimal Forbidden Factors of Circular Words
2017
Minimal forbidden factors are a useful tool for investigating properties of words and languages. Two factorial languages are distinct if and only if they have different (antifactorial) sets of minimal forbidden factors. There exist algorithms for computing the minimal forbidden factors of a word, as well as of a regular factorial language. Conversely, Crochemore et al.ÃÂ [IPL, 1998] gave an algorithm that, given the trie recognizing a finite antifactorial language M, computes a DFA of the language having M as set of minimal forbidden factors. In the same paper, they showed that the obtained DFA is minimal if the input trie recognizes the minimal forbidden factors of a single word. We gener…